How Much Do Language Models Copy From Their Training Data? Evaluating Linguistic Novelty in Text Generation Using RAVEN
Authors
Abstract
Current language models can generate high-quality text. Are they simply copying text they have seen before, or have they learned generalizable linguistic abstractions? To tease apart these possibilities, we introduce RAVEN, a suite of analyses for assessing the novelty of generated text, focusing on sequential structure (n-grams) and syntactic structure. We apply these analyses to four neural language models trained on English (an LSTM, a Transformer, Transformer-XL, and GPT-2). For local structure (e.g., individual dependencies), text generated with a standard sampling scheme is substantially less novel than our baseline of human-generated text from each model's test set. For larger-scale structure (e.g., overall sentence structure), model-generated text is as novel as, or even more novel than, the baseline, but models still sometimes copy substantially, in some cases duplicating passages over 1,000 words long from their training data. We also perform extensive manual analysis, finding evidence that GPT-2 uses both compositional and analogical generalization mechanisms, and showing that GPT-2's novel text is usually well-formed morphologically and syntactically but has reasonably frequent semantic issues (e.g., being self-contradictory).
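The n-gram side of this kind of analysis can be illustrated with a minimal sketch: count what fraction of the n-grams in a generated text never occur in the training text. The function names below are illustrative only, not RAVEN's actual API, and real analyses of this sort run over full training corpora rather than toy strings.

```python
def ngrams(tokens, n):
    """Return the set of n-grams (as tuples) in a token sequence."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def novelty(generated, training, n):
    """Fraction of n-grams in `generated` that never occur in `training`."""
    gen = ngrams(generated, n)
    if not gen:
        return 0.0
    return len(gen - ngrams(training, n)) / len(gen)

# Toy example: only the bigram ("the", "rug") is absent from training.
train_tokens = "the cat sat on the mat".split()
gen_tokens = "the cat sat on the rug".split()
print(novelty(gen_tokens, train_tokens, 2))  # → 0.2
```

As the abstract notes, novelty measured this way tends to rise with n: small n-grams are almost inevitably reused, while long exact matches are the surprising cases.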
Similar Resources
Evaluating Generative Models for Text Generation
Generating human-quality text is a challenging problem because of the ambiguity of meaning and the difficulty of modeling long-term semantic connections. Recurrent Neural Networks (RNNs) have shown promising results in this problem domain, with the most common approach to their training being to maximize the log predictive likelihood of each true token in the training sequence given the previously observ...
How Much Training Is Too Much?
Managing training practices in elite performance domains is recognised to play an important role in preventing musculoskeletal overload, and hence reducing the risk of overuse-related injuries. In international studies spanning four decades, the duration of playing, especially in combination with sudden increases in playing and inadequate rest breaks, remains one of the most common causes of in...
Evaluating historical text normalization systems: How well do they generalize?
We highlight several issues in the evaluation of historical text normalization systems that make it hard to tell how well these systems would actually work in practice—i.e., for new datasets or languages; in comparison to more naïve systems; or as a preprocessing step for downstream NLP tools. We illustrate these issues and exemplify our proposed evaluation practices by comparing two neural mod...
How much do you believe?
This paper responds to a number of criticisms of Dempster-Shafer theory made by Judea Pearl. He criticises Dempster-Shafer belief for not obeying the laws of Bayesian belief; however, these laws lead to well-known problems in the face of ignorance, and seem unreasonably restrictive. It is argued that it is not reasonable to expect a measure of belief to obey Pearl's sandwich principle. The sta...
How much do Workers Search?
In this paper, I consider four determinants of wages: productivity, workers’ bargaining power, competition between employers due to on-the-job search, and search intensity by workers. Workers can increase their job offer arrival rate through costly search. Employers take into consideration the search intensity choices of their employees when the two parties jointly set wages. Using a Nash barga...
Journal
Journal title: Transactions of the Association for Computational Linguistics
Year: 2023
ISSN: 2307-387X
DOI: https://doi.org/10.1162/tacl_a_00567